Shot location has a considerable impact on the probability of a goal. Any xG model, shooting % heatmap, or the adage “go to the net if you want to score” will support that. But defenders rightfully make getting to the front of the net difficult, so offensive players often have to make internal calculations with respect to their location on the ice, defensive positioning, teammate support, and possibly things like score, strength, and fatigue. They must consider the trade-off between shot distance and shot angle, since taking a direct line to the front of the net is rarely an option.
So an interesting and useful question is: at any given point in the offensive zone, which direction increases the probability of a goal most? If a player has the puck at the offensive zone faceoff dot, are they better off trying to get closer to the net at the same angle, or getting toward the middle of the ice without decreasing the distance to the net?
Thankfully, we can borrow concepts from math class to help answer these questions. Differential calculus from high school math has found a home in machine learning in the form of gradient descent, where the error of a model, a multivariate function of its parameters, is repeatedly reduced until we’re fairly certain the model parameters sit at a minimum error. The gradient is just the set of first-order partial derivatives of the function and, as you might remember from math class, represents the rate of change in our metric of interest at any given point with respect to a change in each input variable.
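To make that concrete, here is the gradient written out for a two-input surface. The notation is an assumption for illustration only: P(x, y) stands for the probability of a goal from rink location (x, y), a symbol not used elsewhere in this post.

```latex
\nabla P(x, y) = \left( \frac{\partial P}{\partial x}, \; \frac{\partial P}{\partial y} \right)
```

Each component says how quickly the goal probability changes for a small move along that axis, and the vector as a whole points in the direction of steepest increase.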
Applied to hockey, we just want to calculate the derivative of the probability of a goal being scored with respect to moving in the north-south or east-west direction. We are interested in finding a maximum (shooting %) rather than a minimum, so we will be using gradient ascent, but the concept is the same.
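As a rough illustration of gradient ascent, the sketch below climbs a made-up surface with a single peak at the net. Everything in it is an assumption for demonstration: `toy_prob`, `num_gradient`, `gradient_ascent`, the step size, and the approximate coordinates used for the net (89, 0) and the faceoff dot (69, 22) are not taken from the fitted model in this post.

```r
# Toy stand-in for a shooting % surface (an assumption, not the fitted model):
# one smooth peak located at the net, placed here at (89, 0).
toy_prob <- function(x, y) 0.3 * exp(-((x - 89)^2 + y^2) / 800)

# Central-difference estimate of the gradient (dP/dx, dP/dy) at a single point.
num_gradient <- function(f, x, y, h = 1e-4) {
  c(dx = (f(x + h, y) - f(x - h, y)) / (2 * h),
    dy = (f(x, y + h) - f(x, y - h)) / (2 * h))
}

# Repeatedly step uphill; step_size scales the gradient into a distance moved.
gradient_ascent <- function(f, start, step_size = 500, n_steps = 200) {
  pos <- start
  for (i in seq_len(n_steps)) {
    pos <- pos + step_size * num_gradient(f, pos[[1]], pos[[2]])
  }
  unname(pos)
}

faceoff_dot <- c(69, 22)
num_gradient(toy_prob, faceoff_dot[1], faceoff_dot[2])  # direction of steepest ascent
gradient_ascent(toy_prob, faceoff_dot)                  # ends up at the peak
```

On this toy surface the gradient at the dot points toward the net and toward the middle of the ice at the same time, which is the kind of answer we are after for the real data.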
First, we need a good idea of how the probability of a goal is impacted by shot location. To do this, we can use the x, y coordinates from the NHL PBP data from 2010-18 and simply calculate the shooting % at each point. Looking at just offensive zone shots from locations with more than 25 occurrences, we see a ‘mountain’ right in front of the net. Using this visual, wherever we find ourselves on the map, we want to ‘climb the mountain’ by heading in the direction where we will see the steepest ascent.
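The per-location shooting % table could be built along these lines. This is a sketch under assumptions: the `shots` data frame and its `is_goal` column are hypothetical stand-ins for the play-by-play data, and only the output column names (`X`, `Y`, `Sample`, `Goal_Prob`) are chosen to match what the code in this post expects. It uses base R so it runs without extra packages.

```r
# Hypothetical shot-level table: one row per shot, with a 0/1 goal flag.
shots <- data.frame(
  X = c(80, 80, 80, 80, 60, 60, 60, 60, 60),
  Y = c(0, 0, 0, 0, 20, 20, 20, 20, 20),
  is_goal = c(1, 0, 0, 1, 0, 0, 0, 1, 0)
)

# Count shots and average the goal flag at each (X, Y) location.
surface <- aggregate(
  is_goal ~ X + Y, data = shots,
  FUN = function(g) c(Sample = length(g), Goal_Prob = mean(g))
)

# aggregate() returns a matrix column; flatten it into ordinary columns.
surface <- do.call(data.frame, surface)
names(surface) <- c("X", "Y", "Sample", "Goal_Prob")
surface
```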
goal_df <- goal_surface_df %>%
  filter(Sample > 25) %>% ## Remove low volume shooting locations
  filter(X > 25) ## Offensive zone shots only
## Reshape to a wide X-by-Y matrix of shooting % for the surface plot
Goal_Prob_mat <- goal_df %>%
  data.table::as.data.table() %>% ## data.table::dcast expects a data.table
  data.table::dcast(X ~ Y, value.var = "Goal_Prob") %>%
  select(-c(X)) %>%
  as.matrix()
plotly::plot_ly(
  z = Goal_Prob_mat,
  colors = "YlOrRd"
) %>%
  add_surface()
The mountain
## Bin locations into round_param x round_param foot cells to smooth the surface
round_param <- 3
goal_df_rounded <- goal_df %>%
  ungroup() %>%
  mutate(X = as.double(round_param * floor(X / round_param)),
         Y = as.double(round_param * floor(Y / round_param)),
         Goal_Prob = as.double(Goal_Prob)
  ) %>%
  group_by(X, Y) %>%
  summarise(Goal_Prob = mean(Goal_Prob, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(X = as.double(X),
         Y = as.double(Y),
         Goal_Prob = as.double(Goal_Prob))
## Contour lines of shooting % drawn over the rink; after_stat() replaces the older calc()
goal_contour <- rink +
  geom_contour(data = goal_df_rounded, aes(x = X, y = Y, z = Goal_Prob, colour = after_stat(level)), bins = 8) +
  scale_color_distiller(palette = "YlOrRd", trans = "reverse", labels = scales::percent) +
  labs(color = "Shooting %")
goal_contour
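The code that follows reads coefficients from a fitted `model` and parses the X and Y powers out of the last characters of each term name. The fit itself is not shown in this post, so the sketch below is only a guess at its shape: a linear model on a raw two-variable polynomial, which produces term names ending in "i.j" (X power, dot, Y power). The `toy` data, `toy_model`, and the degree are all assumptions for illustration.

```r
# Toy grid standing in for goal_df_rounded (synthetic values, not NHL data).
toy <- expand.grid(X = seq(25, 89, by = 4), Y = seq(-40, 40, by = 4))
toy$Goal_Prob <- 0.02 + 0.001 * toy$X - 0.00002 * toy$Y^2

# raw = TRUE keeps plain powers, so each coefficient multiplies X^i * Y^j directly.
toy_model <- lm(Goal_Prob ~ poly(X, Y, degree = 2, raw = TRUE), data = toy)

# Term names end in "i.j": the X power, a dot, then the Y power.
names(coef(toy_model))
```

With orthogonal polynomials (the `poly()` default) the coefficients would not map onto simple powers of X and Y, so a raw polynomial is what makes term-by-term differentiation possible.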
model_info <- summary(model)
## Build f(X, Y) and its partial derivatives as strings, one polynomial term per row.
## Term names end in "<X power>.<Y power>"; the intercept parses to NA powers.
derivative_strings <- data.frame(term = rownames(model_info$coefficients),
                                 model_info$coefficients) %>%
  #filter(term != "(Intercept)") %>%
  mutate(len = nchar(as.character(term)),
         X_pow = (as.numeric(substr(term, len - 2, len - 2))),
         Y_pow = (as.numeric(substr(term, len, len))),
         X_pow_prime = X_pow - 1,
         Y_pow_prime = Y_pow - 1,
         f_value = ifelse(is.na(X_pow) | is.na(Y_pow), paste0(Estimate),
                   ifelse(X_pow > 0 & Y_pow > 0, paste0(Estimate, " * X**", X_pow, " * Y**", Y_pow),
                   ifelse(X_pow > 0, paste0(Estimate, " * X**", X_pow),
                   ifelse(Y_pow > 0, paste0(Estimate, " * Y**", Y_pow),
                          paste0(Estimate))))),
         ## Power rule in X; terms without X drop out (NA)
         df_dx = ifelse(X_pow_prime >= 0 & Y_pow > 0, paste0(X_pow * Estimate, " * X**", X_pow_prime, " * Y**", Y_pow),
                 ifelse(X_pow_prime >= 0, paste0(X_pow * Estimate, " * X**", X_pow_prime),
                        NA)),
         ## Power rule in Y; terms without Y drop out (NA)
         df_dy = ifelse(Y_pow_prime >= 0 & X_pow > 0, paste0(Y_pow * Estimate, " * Y**", Y_pow_prime, " * X**", X_pow),
                 ifelse(Y_pow_prime >= 0, paste0(Y_pow * Estimate, " * Y**", Y_pow_prime),
                        NA))
  )
args <- "X, Y"
function_str <- derivative_strings %>% select(f_value) %>% na.omit() %>% group_by() %>% summarise(f = paste0(f_value, collapse = " + "))
eval(parse(text = paste('f_xy <- function(', args, ') { return(' , function_str , ')}', sep='')))
function_df_dx <- derivative_strings %>% select(df_dx) %>% na.omit() %>% group_by() %>% summarise(df_dx = paste0(df_dx, collapse = " + "))
eval(parse(text = paste('df_dx <- function(', args, ') { return(' , function_df_dx , ')}', sep='')))
function_df_dy <- derivative_strings %>% select(df_dy) %>% na.omit() %>% group_by() %>% summarise(df_dy = paste0(df_dy, collapse = " + "))
eval(parse(text = paste('df_dy <- function(', args, ') { return(' , function_df_dy , ')}', sep='')))
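One way to sanity-check hand-built derivative strings like these is to compare them against base R's symbolic differentiation, `D()`. The sketch below does that for a small made-up polynomial; `f_expr`, the coefficients, and the `_hand` functions are illustrative assumptions, not the fitted model's terms.

```r
# Hypothetical polynomial surface, written in the same Estimate * X^i * Y^j form.
f_expr <- quote(0.02 + 0.001 * X^1 - 0.00002 * Y^2 + 0.000003 * X^1 * Y^2)

# Symbolic partial derivatives from base R.
dfdx_expr <- D(f_expr, "X")
dfdy_expr <- D(f_expr, "Y")

# Partials derived by hand with the power rule, for comparison.
dfdx_hand <- function(X, Y) 0.001 + 0.000003 * Y^2
dfdy_hand <- function(X, Y) -2 * 0.00002 * Y + 2 * 0.000003 * X * Y

# Evaluate both versions at one location and compare.
pt <- list(X = 69, Y = 22)
all.equal(eval(dfdx_expr, pt), dfdx_hand(69, 22))
all.equal(eval(dfdy_expr, pt), dfdy_hand(69, 22))
```

If the two disagree at a handful of test points, the string-building logic is the first place to look.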
## Evaluate the surface and its gradient at every location; arrows are scaled by 30 for visibility
goal_jacobin <- goal_df %>%
  mutate(f = f_xy(X, Y),
         dfdx = df_dx(X, Y),
         dfdy = df_dy(X, Y),
         X_end = X + (dfdx * 30),
         Y_end = Y + (dfdy * 30)
  )
goal_contour +
  geom_segment(data = goal_jacobin, aes(x = X, xend = X_end, y = Y, yend = Y_end, color = f), position = "identity",
               arrow = arrow(length = unit(0.15, "cm"))
  ) +
  #scale_color_continuous(labels = scales::percent) +
  labs(color = "Shooting %")